Issues in pre- and post-translation document expansion: untranslatable cognates and missegmented words
Abstract
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a more recent development, motivated by error-prone transcription and translation processes in spoken document and cross-language retrieval. In the cross-language case, one can perform expansion before translation, after translation, or at both points. We investigate the relative impact of pre- and post-translation document expansion for cross-language spoken document retrieval in Mandarin Chinese. We find that post-translation expansion yields a highly significant improvement in retrieval effectiveness, while improvements due to pre-translation expansion, alone or in combination, do not reach significance. We identify two key factors, segmentation and translation in Chinese orthography, that limit the effectiveness of pre-translation expansion in the Chinese-English case, while post-translation expansion yields its full benefit.
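As a rough illustration of document-side expansion by pseudo-relevance feedback, the sketch below treats a document as a query against a comparison collection, takes the top-ranked documents as feedback, and appends their highest-weighted new terms. It is a minimal sketch and not the method used in the paper: the function names (`tokenize`, `expand_document`), the TF-IDF style weighting, the whitespace tokenizer, and the parameters `top_docs` and `top_terms` are all assumptions made for the example. Whitespace tokenization only suits already-segmented text such as English, so this corresponds most closely to the post-translation case; pre-translation expansion over Chinese would first need a word segmenter.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace-delimited text (e.g. English after translation).
    return text.lower().split()


def expand_document(doc, collection, top_docs=2, top_terms=3):
    """Append high-weight terms from the collection documents most similar to doc."""
    doc_terms = Counter(tokenize(doc))
    coll_terms = [Counter(tokenize(d)) for d in collection]
    n = len(collection)

    # Document frequency of each term across the comparison collection.
    df = Counter()
    for counts in coll_terms:
        df.update(counts.keys())

    def idf(term):
        # Smoothed inverse document frequency (hypothetical weighting choice).
        return math.log((n + 1) / (df[term] + 1)) + 1.0

    def score(counts):
        # Use the document itself as the query: weighted term overlap.
        return sum(doc_terms[t] * counts[t] * idf(t) ** 2
                   for t in doc_terms if t in counts)

    # Rank the collection and keep the top documents that share any term.
    ranked = sorted(coll_terms, key=score, reverse=True)
    feedback = [c for c in ranked[:top_docs] if score(c) > 0]

    # Weight candidate expansion terms that the document does not contain yet.
    weights = Counter()
    for counts in feedback:
        for term, tf in counts.items():
            if term not in doc_terms:
                weights[term] += tf * idf(term)

    # Break ties alphabetically so the output is deterministic.
    best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_terms]
    new_terms = [term for term, _ in best]
    return doc + " " + " ".join(new_terms) if new_terms else doc


collection = [
    "mandarin news broadcast transcript",
    "mandarin speech recognition errors",
    "stock market report",
]
expanded = expand_document("mandarin broadcast", collection)
print(expanded)
```

In a cross-language setting, the same routine could be run once over a source-language collection before translation and once over a target-language collection after translation, which is the pre- versus post-translation contrast the abstract describes.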
Similar resources
Automatic Detection of Orthographics Cues for Cognate Recognition
Present-day machine translation technologies crucially depend on the size and quality of lexical resources. Much of recent research in the area has been concerned with methods to build bilingual dictionaries automatically. In this paper we propose a methodology for the automatic detection of cognates between two languages based solely on the orthography of words. From a set of known cognates, t...
University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion
Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation can provide an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furthermore, for CLIR in unsegmented Asian languages, ...
Translation Studies: Pre-Discipline, Discipline, Interdiscipline, and Post-Discipline
In the West, Translation Studies as a discipline has a very short but lively history. Founded in the early 1970s in the Low Countries—Holland and Belgium—translation studies is a fairly new field. Yet, today some theorists suggest that the discipline is too limited to translated texts and excludes much translation data being generated from other fields of inquiry, including theater, art, archit...
Mental Representation of Cognates/Noncognates in Persian-Speaking EFL Learners
The purpose of this study was to investigate the mental representation of cognate and noncognate translation pairs in languages with different scripts, in order to test the prediction of the dual lexicon model (Gollan, Forster, & Frost, 1997). Two groups of Persian-speaking English language learners were tested on cognate and noncognate translation pairs in Persian-English and English-Persian directions with...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to some s...